Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
Abstract Linking sequence-derived microbial taxa abundances to host (patho-)physiology or habitat characteristics in a reproducible and interpretable manner has remained a formidable challenge for the analysis of microbiome survey data. Here, we introduce a flexible probabilistic modeling framework, VI-MIDAS (variational inference for microbiome survey data analysis), that enables joint estimation of context-dependent drivers and broad patterns of associations of microbial taxon abundances from microbiome survey data. VI-MIDAS comprises mechanisms for direct coupling of taxon abundances with covariates and taxa-specific latent coupling, which can incorporate spatio-temporal information and taxon–taxon interactions. We leverage mean-field variational inference for posterior VI-MIDAS model parameter estimation and illustrate model building and analysis using Tara Ocean Expedition survey data. Using VI-MIDAS’ latent embedding model and tools from network analysis, we show that marine microbial communities can be broadly categorized into five modules, including SAR11-, nitrosopumilus-, and alteromondales-dominated communities, each associated with specific environmental and spatiotemporal signatures. VI-MIDAS also finds evidence for largely positive taxon–taxon associations in SAR11 or Rhodospirillales clades, and negative associations with Alteromonadales and Flavobacteriales classes. Our results indicate that VI-MIDAS provides a powerful integrative statistical analysis framework for discovering broad patterns of associations between microbial taxa and context-specific covariate data from microbiome survey data.more » « less
-
Heterotrophic bacteria and archaea (“heteroprokaryotes”) drive global carbon cycling, but how to quantitatively organize their functional complexity remains unclear. We generated a global-scale understanding of marine heteroprokaryotic functional biogeography by synthesizing genetic sequencing data with a mechanistic marine ecosystem model. We incorporated heteroprokaryotic diversity into the trait-based model along two axes: substrate lability and growth strategy. Using genetic sequences along three ocean transects, we compiled 21 heteroprokaryotic guilds and estimated their degree of optimization for rapid growth (copiotrophy). Data and model consistency indicated that gradients in grazing and substrate lability predominantly set biogeographical patterns, and we identified deep-ocean “slow copiotrophs” whose ecological interactions control the surface accumulation of dissolved organic carbon.more » « lessFree, publicly-accessible full text available May 22, 2026
-
Abstract We introduce the Global rRNA Universal Metabarcoding Plankton database (GRUMP), which consists of 1194 samples that were collected from 2003–2020 and cover extensive latitudinal and longitudinal transects, as well as depth profiles in all major ocean basins. DNA from unfractionated (>0.2 µm) seawater samples was amplified using the 515Y/926 R universal three-domain rRNA gene primers, simultaneously quantifying the relative abundance of amplicon sequencing variants (ASVs) from bacteria, archaea, eukaryotic nuclear 18S, and eukaryotic plastid 16S. Thus, the ratio between taxa in one sample is directly comparable to the ratio in any other GRUMP sample, regardless of gene copy number differences. This obviates a problem in prior global studies that used size-fractionation and different rRNA gene primers for bacteria, archaea, and eukaryotes, precluding comparisons across size fractions or domains. On average, bacteria contributed 71%, eukaryotes 19%, and archaea 8% to rRNA gene abundance, though eukaryotes contributed 32% at latitudes >40°. GRUMP is publicly available on the Simons Collaborative Marine Atlas Project (CMAP), promoting the global comparison of marine microbial dynamics.more » « less
-
Gilbert, Jack A. (Ed.)ABSTRACT Small subunit rRNA (SSU rRNA) amplicon sequencing can quantitatively and comprehensively profile natural microbiomes, representing a critically important tool for studying diverse global ecosystems. However, results will only be accurate if PCR primers perfectly match the rRNA of all organisms present. To evaluate how well marine microorganisms across all 3 domains are detected by this method, we compared commonly used primers with >300 million rRNA gene sequences retrieved from globally distributed marine metagenomes. The best-performing primers compared to 16S rRNA of bacteria and archaea were 515Y/926R and 515Y/806RB, which perfectly matched over 96% of all sequences. Considering cyanobacterial and chloroplast 16S rRNA, 515Y/926R had the highest coverage (99%), making this set ideal for quantifying marine primary producers. For eukaryotic 18S rRNA sequences, 515Y/926R also performed best (88%), followed by V4R/V4RB (18S rRNA specific; 82%)—demonstrating that the 515Y/926R combination performs best overall for all 3 domains. Using Atlantic and Pacific Ocean samples, we demonstrate high correspondence between 515Y/926R amplicon abundances (generated for this study) and metagenomic 16S rRNA (median R 2 = 0.98, n = 272), indicating amplicons can produce equally accurate community composition data compared with shotgun metagenomics. Our analysis also revealed that expected performance of all primer sets could be improved with minor modifications, pointing toward a nearly completely universal primer set that could accurately quantify biogeochemically important taxa in ecosystems ranging from the deep sea to the surface. In addition, our reproducible bioinformatic workflow can guide microbiome researchers studying different ecosystems or human health to similarly improve existing primers and generate more accurate quantitative amplicon data. IMPORTANCE PCR amplification and sequencing of marker genes is a low-cost technique for monitoring prokaryotic and eukaryotic microbial communities across space and time but will work optimally only if environmental organisms match PCR primer sequences exactly. In this study, we evaluated how well primers match globally distributed short-read oceanic metagenomes. Our results demonstrate that primer sets vary widely in performance, and that at least for marine systems, rRNA amplicon data from some primers lack significant biases compared to metagenomes. We also show that it is theoretically possible to create a nearly universal primer set for diverse saline environments by defining a specific mixture of a few dozen oligonucleotides, and present a software pipeline that can guide rational design of primers for any environment with available meta’omic data.more » « less
-
Summary Universal primers for SSU rRNA genes allow profiling of natural communities by simultaneously amplifying templates from Bacteria, Archaea, and Eukaryota in a single PCR reaction. Despite the potential to show relative abundance for all rRNA genes, universal primers are rarely used, due to various concerns including amplicon length variation and its effect on bioinformatic pipelines. We thus developed 16S and 18S rRNA mock communities and a bioinformatic pipeline to validate this approach. Using these mocks, we show that universal primers (515Y/926R) outperformed eukaryote‐specific V4 primers in observed versus expected abundance correlations (slope = 0.88 vs. 0.67–0.79), and mock community members with single mismatches to the primer were strongly underestimated (threefold to eightfold). Using field samples, both primers yielded similar 18S beta‐diversity patterns (Mantel test,p < 0.001) but differences in relative proportions of many rarer taxa. To test for length biases, we mixed mock communities (16S + 18S) before PCR and found a twofold underestimation of 18S sequences due to sequencing bias. Correcting for the twofold underestimation, we estimate that, in Southern California field samples (1.2–80 μm), there were averages of 35% 18S, 28% chloroplast 16S, and 37% prokaryote 16S rRNA genes. These data demonstrate the potential for universal primers to generate comprehensive microbiome profiles.more » « less
An official website of the United States government
